Goto

Collaborating Authors

 Harrison County


MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains

Xue, Leyan, Zhang, Changqing, Xue, Kecheng, Liu, Xiaohong, Wang, Guangyu, Han, Zongbo

arXiv.org Artificial Intelligence

Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.


Self-Supervised Learning-Based Path Planning and Obstacle Avoidance Using PPO and B-Splines in Unknown Environments

Shokouhi, Shahab, Oruc, Oguzhan, Thein, May-Win

arXiv.org Artificial Intelligence

This paper introduces SmartBSP, an advanced self-supervised learning framework for real-time path planning and obstacle avoidance in autonomous robotics navigating through complex environments. The proposed system integrates Proximal Policy Optimization (PPO) with Convolutional Neural Networks (CNN) and Actor-Critic architecture to process limited LIDAR inputs and compute spatial decision-making probabilities. The robot's perceptual field is discretized into a grid format, which the CNN analyzes to produce a spatial probability distribution. During the training process a nuanced cost function is minimized that accounts for path curvature, endpoint proximity, and obstacle avoidance. Simulations results in different scenarios validate the algorithm's resilience and adaptability across diverse operational scenarios. Subsequently, Real-time experiments, employing the Robot Operating System (ROS), were carried out to assess the efficacy of the proposed algorithm.


Encouraging Responsible Use of Generative AI in Education: A Reward-Based Learning Approach

Singh, Aditi, Ehtesham, Abul, Kumar, Saket, Gupta, Gaurav Kumar, Khoei, Tala Talaei

arXiv.org Artificial Intelligence

This research introduces an innovative mathematical learning approach that integrates generative AI to cultivate a structured learning rather than quick solution. Our method combines chatbot capabilities and generative AI to offer interactive problem-solving exercises, enhancing learning through a stepby-step approach for varied problems, advocating for the responsible use of AI in education. Our approach emphasizes that immediate answers from ChatGPT can impede real learning. We introduce a reward-based system that requires students to solve mathematical problems effectively to receive the final answer. This encourages a progressive learning path from basic to complex problems, rewarding mastery with final solutions. The goal is to transition students from seeking quick fixes to engaging actively in a comprehensive learning experience.


3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification

Varahagiri, Shyam, Sinha, Aryaman, Dubey, Shiv Ram, Singh, Satish Kumar

arXiv.org Artificial Intelligence

In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cannot extract spectral-spatial information like CNNs. Furthermore, to have high classification performance, there should be a strong interaction between the HSI token and the class (CLS) token. To solve these issues, we propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification that utilizes a 3D-Convolution Guided Residual Module (CGRM) in-between encoders to "fuse" the local spatial and spectral information and to enhance the feature propagation. Furthermore, we forego the class token and instead apply Global Average Pooling, which effectively encodes more discriminative and pertinent high-level features for classification. Extensive experiments have been conducted on three public HSI datasets to show the superiority of the proposed model over state-of-the-art traditional, convolutional, and Transformer models. The code is available at https://github.com/ShyamVarahagiri/3D-ConvSST.


COLA: Characterizing and Optimizing the Tail Latency for Safe Level-4 Autonomous Vehicle Systems

Liu, Haolan, Wang, Zixuan, Zhao, Jishen

arXiv.org Artificial Intelligence

Autonomous vehicles (AVs) are envisioned to revolutionize our life by providing safe, relaxing, and convenient ground transportation. The computing systems in such vehicles are required to interpret various sensor data and generate responses to the environment in a timely manner to ensure driving safety. However, such timing-related safety requirements are largely unexplored in prior works. In this paper, we conduct a systematic study to understand the timing requirements of AV systems. We focus on investigating and mitigating the sources of tail latency in Level-4 AV computing systems. We observe that the performance of AV algorithms is not uniformly distributed -- instead, the latency is susceptible to vehicle environment fluctuations, such as traffic density. This contributes to burst computation and memory access in response to the traffic, and further leads to tail latency in the system. Furthermore, we observe that tail latency also comes from a mismatch between the pre-configured AV computation pipeline and the dynamic latency requirements in real-world driving scenarios. Based on these observations, we propose a set of system designs to mitigate AV tail latency. We demonstrate our design on widely-used industrial Level-4 AV systems, Baidu Apollo and Autoware. The evaluation shows that our design achieves 1.65 X improvement over the worst-case latency and 1.3 X over the average latency, and avoids 93% of accidents on Apollo.


Augmented Language Models: a Survey

Mialon, Grégoire, Dessì, Roberto, Lomeli, Maria, Nalmpantis, Christoforos, Pasunuru, Ram, Raileanu, Roberta, Rozière, Baptiste, Schick, Timo, Dwivedi-Yu, Jane, Celikyilmaz, Asli, Grave, Edouard, LeCun, Yann, Scialom, Thomas

arXiv.org Artificial Intelligence

This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing tokens prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advance in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues.


Momentum Decoding: Open-ended Text Generation As Graph Exploration

Lan, Tian, Su, Yixuan, Liu, Shuhang, Huang, Heyan, Mao, Xian-Ling

arXiv.org Artificial Intelligence

Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy/beam search) often lead to the degeneration problem, i.e., the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness prone to incoherence or require a look-ahead mechanism that demands extra computational overhead. In this study, we formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph. Thereby, we understand the phenomenon of degeneration as circular loops within the directed graph. Based on our formulation, we propose a novel decoding method -- \textit{momentum decoding} -- which encourages the LM to \textit{greedily} explore new nodes outside the current graph. Meanwhile, it also allows the LM to return to the existing nodes with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while enjoying notably improved inference speed and computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our codes and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding.


Shared Manifold Learning Using a Triplet Network for Multiple Sensor Translation and Fusion with Missing Data

Dutt, Aditya, Zare, Alina, Gader, Paul

arXiv.org Artificial Intelligence

Abstract--Heterogeneous data fusion can enhance the robustness and accuracy of an algorithm on a given task. However, due to the difference in various modalities, aligning the sensors and embedding their information into discriminative and compact representations is challenging. In this paper, we propose a Contrastive learning based MultiModal Alignment Network (CoMMANet) to align data from different sensors into a shared and discriminative manifold where class information is preserved. The proposed architecture uses a multimodal triplet autoencoder to cluster the latent space in such a way that samples of the same classes from each heterogeneous modality are mapped close to each other. Since all the modalities exist in a shared manifold, a unified classification framework is proposed. A comparison made with other methods demonstrates the superiority of this method. This method is also called decision fusion. In the context of a neural network, these outstanding results on tasks like land-use and land-cover representations are generated by the convolutional layers classification (LULC) [1] [2], mineral exploration [3] [4] and fused gradually to form a shared representation [5], urban planning [6], biodiversity conservation [7], sentiment layer. In Fusion methods can be classified into two groups: concatenation and alignment-based methods. Personal use of this material is permitted. To increase the interpretability learn spatial information by using a structured morphological of fusion models, Hong et al. [27] proposed a element of predefined size and shape. They proposed a graphbased shared and specific feature learning (S2FL) that is capable of model to couple the dimension reduction and fusion of decomposing data into modality-shared and modality-specific information. However, using this method, the cloud-covered components, which enables a better information blending of regions are not accurately classified because the morphological multiple heterogeneous modalities.


PEER: A Collaborative Language Model

Schick, Timo, Dwivedi-Yu, Jane, Jiang, Zhengbao, Petroni, Fabio, Lewis, Patrick, Izacard, Gautier, You, Qingfei, Nalmpantis, Christoforos, Grave, Edouard, Riedel, Sebastian

arXiv.org Artificial Intelligence

Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model that is trained to imitate the entire writing process itself: PEER can write drafts, add suggestions, propose edits and provide explanations for its actions. Crucially, we train multiple instances of PEER able to infill various parts of the writing process, enabling the use of self-training techniques for increasing the quality, amount and diversity of training data. This unlocks PEER's full potential by making it applicable in domains for which no edit histories are available and improving its ability to follow instructions, to write useful comments, and to explain its actions. We show that PEER achieves strong performance across various domains and editing tasks.


Robust Semi-Supervised Classification using GANs with Self-Organizing Maps

Fick, Ronald, Gader, Paul, Zare, Alina

arXiv.org Artificial Intelligence

Generative adversarial networks (GANs) have shown tremendous promise in learning to generate data and effective at aiding semi-supervised classification. However, to this point, semi-supervised GAN methods make the assumption that the unlabeled data set contains only samples of the joint distribution of the classes of interest, referred to as inliers. Consequently, when presented with a sample from other distributions, referred to as outliers, GANs perform poorly at determining that it is not qualified to make a decision on the sample. The problem of discriminating outliers from inliers while maintaining classification accuracy is referred to here as the DOIC problem. In this work, we describe an architecture that combines self-organizing maps (SOMs) with SS-GANS with the goal of mitigating the DOIC problem and experimental results indicating that the architecture achieves the goal. Multiple experiments were conducted on hyperspectral image data sets. The SS-GANS performed slightly better than supervised GANS on classification problems with and without the SOM. Incorporating the SOMs into the SS-GANs and the supervised GANS led to substantially mitigation of the DOIC problem when compared to SS-GANS and GANs without the SOMs. Furthermore, the SS-GANS performed much better than GANS on the DOIC problem, even without the SOMs.